forked from NVIDIA/Megatron-LM
From NVIDIA Megatron-LM for visibility #18
Open
RaymondLi0 wants to merge 4,946 commits into bigcode-project:multi-query-attention from NVIDIA:main
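The base branch carries the BigCode multi-query-attention work. As background only, below is a minimal PyTorch sketch of the multi-query attention idea (all query heads attending over one shared key/value head, which shrinks the K/V projections and the inference KV cache); the class and parameter names are illustrative assumptions, not the branch's actual implementation.

```python
import torch
import torch.nn as nn


class MultiQueryAttention(nn.Module):
    """Illustrative sketch: one shared K/V head serves all query heads."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        # A single K head and a single V head instead of num_heads of each.
        self.k_proj = nn.Linear(hidden_size, self.head_dim)
        self.v_proj = nn.Linear(hidden_size, self.head_dim)
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.num_heads, self.head_dim).transpose(1, 2)  # (b, h, s, d)
        k = self.k_proj(x).unsqueeze(1)  # (b, 1, s, d), broadcast across all heads
        v = self.v_proj(x).unsqueeze(1)  # (b, 1, s, d)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5       # (b, h, s, s)
        causal = torch.triu(torch.ones(s, s, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        out = scores.softmax(dim=-1) @ v                                # (b, h, s, d)
        return self.out_proj(out.transpose(1, 2).reshape(b, s, -1))


# Same call signature as standard multi-head self-attention.
mqa = MultiQueryAttention(hidden_size=512, num_heads=8)
y = mqa(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```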
Conversation
Expose TE fused MLP with module spec See merge request ADLR/megatron-lm!3384
Co-authored-by: root <[email protected]> Co-authored-by: William Dykas <[email protected]>
Moe inference functional tests See merge request ADLR/megatron-lm!3403
Co-authored-by: Oliver Koenig <[email protected]>
ci: Benchmark release tests suite with TE2.2 on H100 See merge request ADLR/megatron-lm!3458
Move data to GPU for TP data processing See merge request ADLR/megatron-lm!3371
…, embedding tying" This reverts commit 5ae21f8.
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Gao Deng <[email protected]> Co-authored-by: Gao Deng <[email protected]>
Optimize dummy weight tensors for cudagraph and fix shape mismatch between vision and language transformer See merge request ADLR/megatron-lm!3366
Add --enable-experimental to args. See merge request ADLR/megatron-lm!3377
Co-authored-by: Zijie Yan <[email protected]>
perf(MLA): MLA down proj switch back to TELinear See merge request ADLR/megatron-lm!3281
ci: Retry on network errors See merge request ADLR/megatron-lm!3463
Co-authored-by: Oliver Koenig <[email protected]> Co-authored-by: Guyue Huang <[email protected]> Co-authored-by: Guyue Huang <[email protected]>
Add TE functional tests See merge request ADLR/megatron-lm!3361
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: oliver könig <[email protected]>
…to TELinear" This reverts commit e63aee4.
Signed-off-by: oliver könig <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]>
Added support for offloading Swiglu activations to CPU See merge request ADLR/megatron-lm!3024
Force inference to always gather logits with tensor parallelism See merge request ADLR/megatron-lm!3442
Only run prefill for requests that do not generate tokens See merge request ADLR/megatron-lm!3499
Co-authored-by: Cyril Meurillon <[email protected]> Co-authored-by: Cyril Meurillon <[email protected]>
Enable reruns by default See merge request ADLR/megatron-lm!2739
Co-authored-by: Ye Yu <[email protected]> Co-authored-by: Chenhan Yu <[email protected]>
Clean up ModelOpt finetune scripts and add validation feature See merge request ADLR/megatron-lm!3268
Fix typo in parallel_state expert parallelism See merge request ADLR/megatron-lm!3548
Fix cuda graph logic to determine first/last layers per stage in flexible pp layout See merge request ADLR/megatron-lm!3505
Remove extra barrier in checkpoint flow See merge request ADLR/megatron-lm!3626
Fix error when TE is not installed See merge request ADLR/megatron-lm!3625
Adding support for Spike No More embedding initializations and associated weight decay skipping. See merge request ADLR/megatron-lm!3500
MiMo video VLM train example See merge request ADLR/megatron-lm!3543
ci: Retry on `free(): invalid pointer` See merge request ADLR/megatron-lm!3632
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Keshav Santhanam <[email protected]> Co-authored-by: William Dykas <[email protected]>
Add Dynamic Backend Inference Tests See merge request ADLR/megatron-lm!3475
fix(distckpt, moe): Fix distckpt optimizer state loading with PP>1 to ensure bit-wise match after saving and loading. See merge request ADLR/megatron-lm!3394
tests: Fix segfaults (maybe?) See merge request ADLR/megatron-lm!3605
Co-authored-by: liu-zichen <[email protected]>
Fix mrope with context parallel See merge request ADLR/megatron-lm!3603
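For context on one entry above, "Added support for offloading Swiglu activations to CPU" (ADLR/megatron-lm!3024): the sketch below shows a plain SwiGLU MLP and marks the intermediate activation that such offloading would move to host memory between forward and backward. The fused gate/up layout and all names are assumptions for illustration, not Megatron-LM's implementation, and the offloading machinery itself is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    """Illustrative SwiGLU feed-forward block: SiLU(gate) * up, then a down projection."""

    def __init__(self, hidden_size: int, ffn_size: int):
        super().__init__()
        # Fused gate/up projection, a common layout for SwiGLU MLPs (an assumption here).
        self.w_in = nn.Linear(hidden_size, 2 * ffn_size, bias=False)
        self.w_out = nn.Linear(ffn_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = self.w_in(x).chunk(2, dim=-1)
        # SiLU(gate) * up is the large intermediate activation that CPU
        # offloading would stash off-GPU until the backward pass needs it.
        return self.w_out(F.silu(gate) * up)


mlp = SwiGLUMLP(hidden_size=512, ffn_size=1376)
y = mlp(torch.randn(2, 16, 512))  # -> (2, 16, 512)
```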
No description provided.